Quantum Time-Space Tradeoffs for Sorting * 



Hartmut Klauck
School of Mathematics
Institute for Advanced Study
Princeton, NJ 08540, USA
klauck@ias.edu



Abstract 

We investigate the complexity of sorting in the model of sequential
quantum circuits. While it is known that in general a quantum algorithm
based on comparisons alone cannot outperform classical sorting algorithms
by more than a constant factor in time complexity, this is no longer true in a
space-bounded setting. We observe that for all storage bounds
$n/\log n \ge S \ge \log^3 n$, one can devise a quantum algorithm that sorts n numbers
(using comparisons only) in time $T = O(n^{3/2}\log^{3/2} n/\sqrt{S})$. We then
show the following lower bound on the time-space tradeoff for sorting n
numbers from a polynomial size range in a general sorting algorithm (not
necessarily based on comparisons): $TS = \Omega(n^{3/2})$. Hence for small values
of S the upper bound is almost tight. Classically the time-space tradeoff
for sorting is $TS = \Theta(n^2)$.



1 Introduction 

Sorting is arguably one of the most important and well-studied problems in 
computer science. While any comparison-based algorithm needs $\Omega(n\log n)$ time
to sort n numbers, and several algorithms achieving a matching running time
are well known, the situation changes if we are confronted with the problem of
sorting when the list of data is too long to fit into the memory of our computer.
In this setting we are given a bound S(n) on the available memory and have 
to sort with an algorithm whose space requirements do not exceed this bound. 
This situation arises e.g. if data are stored distributed on the internet and are 
too large to be held in the memory of the computer at a single site. 

After prior results by Borodin and Cook in [7], Beame [3] proved a lower
bound of $TS = \Omega(n^2)$ for any classical algorithm sorting n numbers. A matching
upper bound for sorting on a RAM has been proved in [?] for all space
bounds $\log n \le S \le n/\log n$. This upper bound is actually achieved by an
algorithm that accesses the data only using comparisons. This is remarkable, since
sometimes operations on the numbers themselves can speed up sorting, e.g. Radixsort
(see e.g. [?]) can sort polynomial size numbers in linear time, while
comparison-based algorithms need time $\Omega(n\log n)$ for this task. So the classical time-space
complexity of sorting is well known. Interestingly, while Beame [3] investigates
the Unique Elements problem (output the list of elements appearing exactly
once in the input), Borodin and Cook [7] consider the Ranking problem (output
the permutation needed to sort) and show $TS = \Omega(n^2/\log n)$ for numbers
from a quadratic size range.

Quantum computing is an active research area offering interesting possibili- 
ties to obtain improved solutions to information processing tasks by employing 
computing devices based on quantum physics; see e.g. [?] for a nice introduction
to the field. It has been shown by Høyer et al. in [?], however, that
any comparison-based quantum sorting algorithm needs time $\Omega(n\log n)$, which
is already achieved by classical algorithms. So is there no point in using quan- 
tum computers to sort? We demonstrate that in the space bounded setting the 
quantum complexity of sorting is quite different from the classical complexity. 

We use the model of quantum circuits to investigate time-space tradeoffs. 
While in the classical setting branching programs are the standard model to
consider these problems, employing quantum branching programs seems to com- 
plicate the definition of our model unnecessarily. A quantum circuit uses space 
S, if it operates on S working qubits. Furthermore it has write-only access to 
several output qubits, since this output is typically too large to be held in the 
working memory during the whole computation. The circuit accesses the input 
by specific oracle gates. For the upper bound we use a comparison oracle, i.e., 
an oracle gate gets (superpositions over) two indices of numbers to be compared 
and outputs a bit indicating the comparison result into an extra qubit. For the 
lower bound we choose to restrict the size of the numbers in the input to $n^2$, and
here we consider an access oracle, that directly reads numbers from the input 
into the work space. 

Using Grover's famous quantum search algorithm and its adaptation to minimum
finding by Dürr and Høyer in [12] we can show the following:

Theorem 1 For all S in $[\Omega(\log^3 n), \ldots, O(n/\log n)]$ there is a quantum circuit
with space S that, given a comparison oracle for n numbers, outputs the sorted
sequence, and uses time $O(n^{3/2}\log^{3/2} n/\sqrt{S})$. The entire output is correct with
probability $1 - \epsilon$ for an arbitrarily small constant $\epsilon > 0$.

If we restrict the size of the numbers, we can use the same algorithm while 
employing an access oracle. 

Corollary 1 For all S in $[\Omega(\log^3 n), \ldots, O(n/\log n)]$ there is a quantum circuit
with space S that, given an access oracle for n numbers from $\{1, \ldots, n^k\}$ for
some constant k, outputs the sorted sequence in time $O(n^{3/2}\log^{3/2} n/\sqrt{S})$. The
entire output is correct with probability $1 - \epsilon$ for an arbitrarily small constant
$\epsilon > 0$.

The quantum sorting algorithm clearly beats the classical time-space tradeoff
lower bound. Actually it gives rise to a tradeoff $T^2 S = O(n^3\log^3 n)$. It performs
best in comparison to classical algorithms whenever S is small. E.g. for
$S = \log^k n$ the running time is $O(n^{3/2}/\log^{(k-3)/2} n)$, a clear improvement compared
to the classical bound $\Omega(n^2/\log^k n)$.
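
To get a feel for the size of this gap, the following Python sketch evaluates both bounds for $S = \log^k n$. It assumes all hidden constants are 1 and uses base-2 logarithms, which the theorems do not specify; the function names are illustrative and not from the paper. With constants dropped, the advantage only becomes visible for fairly large n, since the polylogarithmic factors must be outweighed by a factor of $\sqrt{n}$.

```python
import math

def quantum_time_upper(n: int, s: float) -> float:
    """n^(3/2) * log^(3/2) n / sqrt(S), with the hidden constant assumed to be 1."""
    return n ** 1.5 * math.log2(n) ** 1.5 / math.sqrt(s)

def classical_time_lower(n: int, s: float) -> float:
    """n^2 / S, with the hidden constant assumed to be 1."""
    return n ** 2 / s

n = 2 ** 60
for k in (3, 4, 5):
    s = math.log2(n) ** k  # space bound S = log^k n
    ratio = quantum_time_upper(n, s) / classical_time_lower(n, s)
    print(f"k={k}: quantum/classical time ratio = {ratio:.3e}")
```

The ratio works out to $\log^{(k+3)/2} n/\sqrt{n}$, so it shrinks as n grows for every fixed k.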

We then investigate the question whether one can do even better. First
let us fix an output convention for sorting. Denote by Min(x, i) the ith number
of the sorted sequence. Let us assume an algorithm outputs a sequence
$(y_1, j_1), \ldots, (y_n, j_n)$, so that the set $\{j_1, \ldots, j_n\}$ equals $\{1, \ldots, n\}$. Since we
consider quantum algorithms we will allow errors. Furthermore we will require
that the $j_i$ are produced in the same order for all inputs. Our main result is the
following lower bound.

Theorem 2 Let A be any quantum circuit that, given an access oracle for a
sequence x of n numbers from $\{1, \ldots, n^2\}$, outputs n pairs $O_1(x), \ldots, O_n(x)$, so
that the probability that $O_i(x) = (\mathrm{Min}(x, j_i), j_i)$ is 2/3 for each i and the $j_i$ are
a fixed permutation of $\{1, \ldots, n\}$. Suppose A uses S work qubits and T oracle
gates, then $ST = \Omega(n^{3/2})$.

Note that we do not even require that the whole output sequence is ever 
simultaneously correct. 

Hence for small S we cannot substantially speed up the quantum sorting
algorithm. E.g. for $S = \mathrm{poly}(\log n)$ the necessary and sufficient time to sort on
a quantum computer is $\Theta(n^{3/2})$ up to polylogarithmic factors.

Technically the lower bound proof proceeds as follows. A given quantum
circuit is sliced into parts containing only $\delta\sqrt{n}$ queries each. Each such slice starts
with some initial information stored in the S qubits that has been computed
by the previous slices. We show how to get rid of this initial information while
decreasing the success probability by a factor that is only exponential in S. This step is similar
to an application of the union bound (classically, the success probability for one
of the $2^S$ fixed states of the S bits given in the beginning must be at least $1/2^S$ times
as good as the success probability with initial information). We show that
something similar is possible in the quantum case.

Then we are left with the problem to analyze the success probability of a 
slice of the circuit without initial information, and show that it can be only large 
enough if the number of outputs is small. Since there have to be n outputs this
leads to a lower bound on the number of slices and hence the time complexity. 

We also obtain a tight analysis for classical sorting in the case that all the
numbers are known to be distinct. No tight analysis of Ranking seems to
have appeared since (only a modest improvement in [?]), and the Unique
Elements problem is not fit to provide lower bounds in the situation when all
the inputs are known to be distinct, which they are with high probability when
drawn uniformly from a large range.

2 Definitions and Preliminaries



In this section we give some background on quantum states and their distin- 
guishability, define the model of quantum circuits we are studying, describe a 
quantum version of the union bound, and discuss lower bounds on query
complexity by query magnitude arguments. For more quantum background see [?].

2.1 Quantum States 

The quantum mechanical analogue of a random variable is a probability distribution
over superpositions, also called a mixed state. For the mixed state
$X = \{p_i, |\phi_i\rangle\}$, where $|\phi_i\rangle$ has probability $p_i$, the density matrix is defined as
$\rho_X = \sum_i p_i |\phi_i\rangle\langle\phi_i|$. Density matrices are Hermitian, positive semidefinite, and have
trace 1. I.e., a density matrix has real eigenvalues between zero and one, and
they sum up to one.

The trace norm of a matrix A is defined as $\|A\|_t = \mathrm{Tr}\sqrt{A^\dagger A}$, which is the
sum of the magnitudes of the singular values of A. Note that if $\rho$ is a density
matrix, then it has trace norm one.

A useful theorem states that for two mixed states $\rho_1, \rho_2$ their distinguishability
is reflected in $\|\rho_1 - \rho_2\|_t$ [?]:

Fact 1 Let $\rho_1, \rho_2$ be two density matrices on the same space $\mathcal{H}$. Then for any
measurement O,

$$\|\rho_1^O - \rho_2^O\|_1 \le \|\rho_1 - \rho_2\|_t,$$

where $\rho^O$ denotes the classical distribution on outcomes resulting from the
measurement of $\rho$, and $\|\cdot\|_1$ is the $\ell_1$ norm. Furthermore, there is a measurement
O for which the above is an equality.
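
As a small sanity check of Fact 1, the sketch below compares the two distances for single-qubit states, assuming the fact takes its usual form (the $\ell_1$ distance of the outcome distributions is at most the trace norm of the difference of the density matrices). It only handles 2x2 Hermitian matrices, written as hypothetical triples (a, b, d) for the matrix [[a, b], [conj(b), d]], and only considers a measurement in the standard basis; none of these names come from the paper.

```python
import math

def trace_norm_hermitian_2x2(a, b, d):
    """Trace norm of [[a, b], [conj(b), d]] with real a, d: sum of |eigenvalues|."""
    mean = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + abs(b) ** 2)
    return abs(mean - radius) + abs(mean + radius)

def compare_distances(rho1, rho2):
    """Return (l1 distance after a standard-basis measurement, trace norm of rho1 - rho2)."""
    (a1, b1, d1), (a2, b2, d2) = rho1, rho2
    # The diagonal entries are the outcome probabilities of a standard-basis measurement.
    l1 = abs(a1 - a2) + abs(d1 - d2)
    return l1, trace_norm_hermitian_2x2(a1 - a2, b1 - b2, d1 - d2)

zero = (1.0, 0.0, 0.0)  # the pure state |0><0|
plus = (0.5, 0.5, 0.5)  # the pure state |+><+|
l1, tn = compare_distances(zero, plus)
print(l1, tn)
```

For this pair the standard-basis measurement is not optimal, so the first value is strictly smaller than the second.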

2.2 Quantum Circuits 

We now define quantum oracle circuits. The two parameters we are interested 
in are the number of work qubits corresponding to the space bound, and the 
number of queries, which is always smaller than the overall number of gates,
corresponding to the time. 

A quantum circuit on S work qubits and M output qubits is defined as an 
ordered set of gates, where each gate consists of a unitary operation on some 
k qubits and a specification of these k qubits. The gates either operate on the 
work qubits only, or they are output gates that perform a controlled-not, where 
the control is among the work qubits, and the target qubit is an output qubit. 
We require that each output qubit is used only once. In this way the output 
qubits are usable only to record the computation results. 

Another type of gates are query gates, which are needed to access the
input. In general a query gate performs the following operation for some input
$x = x_1, \ldots, x_n$ with $x_i \in \{0,1\}^k$:

$$|i\rangle|a\rangle \mapsto |i\rangle|a \oplus x_i\rangle$$

for all classical strings i, a, where the length of i is $\log n$ and the length of a
is k. The behavior of a query gate on superpositions is defined by linearity.
In a computation all qubits start blank, and then the gates are applied in the
order defined on them. For sorting we consider two different types of input
oracles. In a comparison oracle the algorithm is allowed to query an $n^2$-bit
string that contains the results of the pairwise comparisons between n numbers.
In an access oracle to input $x_1, \ldots, x_n$ the query gate directly reads numbers $x_i$
(or their superpositions). Note that an access oracle can efficiently simulate a
comparison oracle if the numbers to be sorted are from a polynomial size range.

We require that a sorting circuit makes outputs in the following way: if the 
input is presented as a comparison oracle, then the algorithm is supposed to 
output the inverse of the permutation needed to sort. If the input is given as an 
access oracle, then the output is given as a list of numbers with their positions 
in the sorted sequence. We require that a number is output as a whole at some
point in the algorithm, i.e., there are k one-bit output gates directly following
each other when a number from a $2^k$ range is output.

Let us consider the output convention more closely (in the case of an access
oracle). For some input x let $O_i$ denote the ith output, and $O_i(x)$ the probability
distribution obtained by measuring this output. This is a distribution on pairs
$(y_i, j_i)$. Let Min(x, i) denote the ith smallest element of the input;
then the pair $(\mathrm{Min}(x, j_i), j_i)$ is supposed to be produced at output
$O_i$. The sequence $(j_i)$ is independent of x. We say that $O_i(x)$ is correct with
probability $1 - \epsilon$, if $O_i(x) = (\mathrm{Min}(x, j_i), j_i)$ with probability $1 - \epsilon$. We require
in our lower bound theorem that each $O_i(x)$ is correct with probability 2/3.

A circuit is said to have time complexity T, if the number of gates is T (in
the lower bound we will only count the number of queries). 

2.3 A Quantum Union Bound 

In this subsection we want to develop a simple tool needed to take away the 
initial information in a slice of the sorting circuit.

Let $\rho$ denote the density matrix of a state on m qubits. We want to replace
$\rho$ by the completely mixed state, while retaining some of the success probability
of an algorithm taking $\rho$ as an input. The following lemma will be helpful.

Lemma 2 Let $\rho$ be any density matrix on m qubits. Let M denote the density
matrix of the completely mixed state, i.e., the matrix with entries $1/2^m$ on the
diagonal and zeros elsewhere. Then there exists a density matrix $\sigma$, so that

$$M = \frac{1}{2^m}\rho + \Bigl(1 - \frac{1}{2^m}\Bigr)\sigma.$$

Proof. We have to show that $\sigma = (M - \rho/2^m)/(1 - 1/2^m)$ is a density matrix.
Since $\rho$ is a density matrix, $\sigma$ is clearly Hermitian and has trace 1. So we have
to show that $\sigma$ is positive semidefinite, which is equivalent to $I - \rho$ being positive
semidefinite for the identity matrix I, since $M - \rho/2^m = (I - \rho)/2^m$. Let U be some unitary transformation
that diagonalizes $\rho$, i.e., $D = U\rho U^\dagger$ is diagonal. U exists, since $\rho$ is Hermitian.
Clearly $I - \rho$ is positive semidefinite iff $I - D$ is, and D contains on its diagonal
nonnegative numbers that sum to 1, hence $I - D \ge 0$. □
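
The decomposition of Lemma 2 can be checked by hand for a single qubit (m = 1), where the complementary matrix reduces to the identity minus the given state. The following sketch does this for one example state; the (a, b, d) encoding of the matrix [[a, b], [conj(b), d]] and the helper names are assumptions made for this illustration only.

```python
import math

def eigenvalues_hermitian_2x2(a, b, d):
    """Eigenvalues of [[a, b], [conj(b), d]] with real a, d."""
    mean = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + abs(b) ** 2)
    return mean - radius, mean + radius

def sigma_for_one_qubit(a, b, d):
    """For m = 1: sigma = (M - rho/2) / (1 - 1/2) = I - rho, in (a, b, d) form."""
    return 1 - a, -b, 1 - d

rho = (0.75, 0.25 + 0.25j, 0.25)  # an example density matrix (trace 1, PSD)
sigma = sigma_for_one_qubit(*rho)

# sigma must again be a density matrix: trace 1 and nonnegative eigenvalues.
print("trace of sigma:", sigma[0] + sigma[2])
print("eigenvalues of sigma:", eigenvalues_hermitian_2x2(*sigma))

# The mixture (1/2) rho + (1/2) sigma must be the completely mixed state I/2.
mixture = tuple((r + s) / 2 for r, s in zip(rho, sigma))
print("mixture:", mixture)
```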

We want to apply the above lemma in the following way. 

Lemma 3 Suppose there is an algorithm that on some input x first receives S 
qubits of initial information depending arbitrarily on x, and that makes after- 
wards only queries to the input. Suppose the algorithm produces some output 
correctly with probability p. 

Then there is an algorithm that uses no initial information, makes the same 
number of queries, and has success probability $p/2^S$.

Actually the above lemma can be thought of as a quantum union bound.
Note that the S qubits can be in as many states as there are inputs x; still,
removing them decreases the success probability only by a factor exponential
in S.

Proof. Replace the quantum state containing the initial information by the
completely mixed state M on S qubits. Then run the algorithm in exactly the
same manner as before. Clearly the algorithm does not get initial information
in this way. Due to Lemma 2 the original state has some probability $1/2^S$ in M;
that is, one can view M as a mixture of the original state with probability
$1/2^S$ and another state with the remaining probability. So the outcome of
the algorithm is also such a mixture, and if the success probability was p originally,
it must be at least $p/2^S$ for the modified algorithm. □

Note that the completely mixed state on S qubits can easily be obtained
from a blank state on 2S qubits by performing Hadamard gates on the qubits
1 to S and then controlled-not gates on the pairs i, S + i.
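
This preparation can be simulated directly for small S. The sketch below is a plain state-vector simulation written for illustration (the helper names and the qubit numbering, with qubit 0 as the most significant bit, are assumptions): it applies the Hadamard and controlled-not gates to a blank state on 2S qubits, traces out the first S qubits, and returns the reduced density matrix of the remaining S qubits, which should have $1/2^S$ on the diagonal and zeros elsewhere.

```python
import math

def apply_hadamard(state, qubit, n_qubits):
    """Apply a Hadamard gate to `qubit` of a real state vector."""
    bit = 1 << (n_qubits - 1 - qubit)
    r = 1 / math.sqrt(2)
    new = [0.0] * len(state)
    for idx, amp in enumerate(state):
        if amp == 0.0:
            continue
        if idx & bit:              # |1> -> (|0> - |1>) / sqrt(2)
            new[idx ^ bit] += r * amp
            new[idx] -= r * amp
        else:                      # |0> -> (|0> + |1>) / sqrt(2)
            new[idx] += r * amp
            new[idx | bit] += r * amp
    return new

def apply_cnot(state, control, target, n_qubits):
    """Apply a controlled-not: flip `target` whenever `control` is 1."""
    cbit = 1 << (n_qubits - 1 - control)
    tbit = 1 << (n_qubits - 1 - target)
    new = [0.0] * len(state)
    for idx, amp in enumerate(state):
        new[idx ^ tbit if idx & cbit else idx] += amp
    return new

def completely_mixed_via_circuit(s):
    """Reduced density matrix of the last s qubits after the circuit on 2s qubits."""
    n_qubits = 2 * s
    state = [0.0] * (1 << n_qubits)
    state[0] = 1.0                               # blank state |0...0>
    for q in range(s):
        state = apply_hadamard(state, q, n_qubits)
    for q in range(s):
        state = apply_cnot(state, q, s + q, n_qubits)
    dim = 1 << s
    rho = [[0.0] * dim for _ in range(dim)]
    for a in range(dim):                         # basis states of the traced-out register
        for i in range(dim):
            for j in range(dim):
                rho[i][j] += state[a * dim + i] * state[a * dim + j]
    return rho

for row in completely_mixed_via_circuit(2):
    print(row)
```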

Another way of obtaining a similar result is to use quantum teleportation
(invented in [?], see also [?]). In the teleportation scheme two players who share
m EPR-pairs can communicate an arbitrary quantum state in the following way.
If player Alice holds a quantum state $\rho$ on m qubits, she applies measurements
in the Bell basis to the m pairs of qubits given by one qubit from an EPR-pair
and one qubit of $\rho$ each. Bob, holding the m other qubits belonging to the
EPR-pairs, then gets a message from Alice containing her measurement results
in 2m classical bits. He is then able to perform certain operations on his qubits
depending on the message, which enable him to recover $\rho$.

It is known that for each $\rho$, the probability of each of the possible measurement
results is exactly $4^{-m}$. Furthermore for one of the measurement results,
Bob does not have to do anything to his qubits to get $\rho$, i.e., with probability
$4^{-m}$ the state of Bob is correct already. Note that this implies that Bob's qubits
before he receives the message, which are in a completely mixed state, can be
viewed as an ensemble of states $\rho_1, \ldots, \rho_{4^m}$, where $M = \sum_i 4^{-m}\rho_i$ and $\rho_1 = \rho$.

2.4 Lower Bounds for Query Algorithms

In [5] an $\Omega(\sqrt{n})$ lower bound for the problem of finding a marked element in
an unordered database of size n has been given, matching the upper bound
of Grover's algorithm [13]. This lower bound relies on the notion of query
magnitude. For other lower bound techniques for query complexity see e.g. [10].

The query magnitude technique is basically an adversary argument. An
adversary is able to change the black-box input without the query algorithm
noticing that (for a more refined type of quantum adversary argument see [?]).
We use the following statement derived as Corollary 3.4 in [5].

Fact 4 Let $x = x_1, \ldots, x_n$ be an input given as an access oracle, with $x_i$ from
some finite set, and let $x'(i)$ be any input that differs from x in position i and
nowhere else. A is any quantum algorithm that accesses the oracle via at most
T queries. The state $\rho_x$ denotes the final state of A's workspace when querying
x, the state $\rho_{x'(i)}$ when querying $x'(i)$.

Then for any $\alpha > 0$ there is a set of at least $n - T^2/\alpha^2$ input positions i such
that for all $x'(i)$: $\|\rho_x - \rho_{x'(i)}\|_t \le 2\alpha$.

In our lower bound we will need a somewhat stronger statement that allows 
us to deal with a situation conditioned on results of previous measurements. 

Lemma 5 Let $x = x_1, \ldots, x_n$ be an input given as an access oracle, with $x_i$
from some finite set, and let $x'(i)$ be any input that differs from x in position
i and nowhere else. A is any quantum algorithm that accesses the oracle via
at most T queries. Suppose A contains no measurements, but at the end a
measurement is performed on some of the qubits. Fix some outcome F of this
measurement that occurs with probability $q_x$. Assume that some event E happens
with probability $p_x$ conditioned on F.

Then for any $\alpha > 0$ there is a set of at least $n - T^2/\alpha^2$ input positions i
such that if A is performed on $x'(i)$, then the probability that F is the outcome
of the measurement and E happens is at least $q_x(p_x - \alpha)$.

Proof. Let $U_x$ and $U_{x'(i)}$ denote the unitary transformations done by the circuit
A on inputs x and $x'(i)$. W.l.o.g. A starts from the blank state $|0\rangle$ on
some qubits. Let $|\phi_x\rangle$ denote the state obtained by performing $U_x|0\rangle$, and
measuring the resulting state, when F is the result of the measurement. Let
$|\psi_x\rangle = U_x^{-1}|\phi_x\rangle$. Clearly there is a state $|\theta_x\rangle$ so that $|0\rangle = \gamma_1|\psi_x\rangle + \gamma_2|\theta_x\rangle$, and
$|\gamma_1|^2 = q_x$, and $\langle\theta_x|\psi_x\rangle = 0$. We can apply Fact 4 to A running on state $|\psi_x\rangle$,
and get the required number of $x'(i)$, so that the obtained states are close to
each other, i.e., E happens with probability at least $p_x - \alpha$ when $U_{x'(i)}$ is applied to
$|\psi_x\rangle$. We are interested in the joint probability of events E, F when $U_{x'(i)}$ is performed
on $|0\rangle = \gamma_1|\psi_x\rangle + \gamma_2|\theta_x\rangle$. Clearly this probability is at least $q_x(p_x - \alpha)$.

□

3 A Sorting Algorithm



In this section we describe the algorithm needed to prove Theorem 1. For this
upper bound we identify time complexity with the number of constant fan-in
gates needed to build the circuit (queries still count as one gate).

The algorithm iterates minimum finding and uses the following result described
by Dürr and Høyer [12], based on Grover's famous search algorithm [13].

Fact 6 There is a quantum query algorithm that, given a comparison oracle to
n numbers, finds the minimum of these numbers with probability $1 - \epsilon$, and uses
$O(\sqrt{n}\log(1/\epsilon))$ queries (and gates) and space $O(\log^2 n\log(1/\epsilon))$.

Let S be the space bound. Then $x = x_1, \ldots, x_n$ can be partitioned into
$b = S/(c\log n)$ blocks $y^i$ of $O(n\log n/S)$ numbers each (for some large enough
constant c). Here is the algorithm:

1. FOR i := 1 TO b compute the position of the minimum of $y^i$ and store it
together with i.

2. Arrange these minima positions as a Heap ordered by the minima's size
(see [?]).

3. FOR i := 1 TO n DO

(a) Output j if the minimal number among the block minima is $x_j$.

(b) Remove j and its block number k from the Heap.

(c) Find the position l of the minimal number $x_l$ larger than $x_j$ in $y^k$.

(d) Insert (l, k) into the Heap.

Note that all these operations can be performed with a comparison oracle. 
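
The control flow of the steps above can be sketched classically. In the following Python illustration the quantum minimum finding of steps 1 and 3(c) is replaced by a plain linear scan, so the sketch reproduces only the structure of the algorithm (blocks, heap of block minima, repeated extraction), not its running time or space usage. The function name, the tie-breaking by position, and the choice to store values in the heap rather than comparing positions through an oracle are assumptions made for this example.

```python
import heapq

def sort_with_block_heap(xs, num_blocks):
    """Return the positions of xs in sorted order, using blocks and a heap of block minima."""
    n = len(xs)
    if n == 0:
        return []
    size = -(-n // num_blocks)  # ceiling division: numbers per block
    blocks = [range(start, min(start + size, n)) for start in range(0, n, size)]

    def key(p):
        # Break ties by position so that all keys are distinct.
        return (xs[p], p)

    def next_min(block, after=None):
        # Classical stand-in for quantum minimum finding: the position of the
        # smallest element of the block that is larger than the element at `after`.
        candidates = [p for p in block if after is None or key(p) > key(after)]
        return min(candidates, key=key) if candidates else None

    # Step 1: minimum of every block, stored together with its block number.
    heap = [(key(next_min(block)), k) for k, block in enumerate(blocks)]
    heapq.heapify(heap)                          # Step 2
    order = []
    while heap:                                  # Step 3
        (_, j), k = heapq.heappop(heap)          # (a), (b): smallest block minimum
        order.append(j)
        l = next_min(blocks[k], after=j)         # (c): next larger element in block k
        if l is not None:
            heapq.heappush(heap, (key(l), k))    # (d)
    return order

data = [5, 1, 5, 3]
print(sort_with_block_heap(data, num_blocks=2))
```

The heap never holds more than one entry per block, which is what keeps the storage proportional to the number of blocks.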

Steps 1. and 3.c) employ the minimum finding algorithm of Fact 6. To ensure
correctness in step 1. that algorithm is used with error bound $1/S^2$, hence the
time for step 1. is $O(S/\log(n) \cdot \sqrt{n\log(n)/S} \cdot \log S) = O(\sqrt{n\log(n)S})$.

Step 2. can be done in time O(S), steps 3.a) and 3.b) need time $O(\log n)$,
and step 3.d) needs time $O(\log S \cdot \log n)$ in each iteration.

In step 3.c) the algorithm is used with error $1/n^2$. Then the running time
for 3.c) is overall $O(n \cdot \sqrt{n\log(n)/S} \cdot \log n) = O(n^{3/2}\log^{3/2}(n)/\sqrt{S})$.

The time spent in the other steps is dominated by the time used in step 3.c),
if $S = O(n/\log n)$.
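
The bookkeeping of the individual steps can be tabulated with a small cost model. The sketch below assumes all hidden constants are 1 (including the block constant c and the factor 2 from the error bounds), uses base-2 logarithms, and treats S as a real number; the function and key names are illustrative only.

```python
import math

def step_costs(n, s, c=1.0):
    """Per-step time estimates for the sorting algorithm, constants dropped."""
    log_n = math.log2(n)
    num_blocks = s / (c * log_n)          # b = S / (c log n)
    block_size = n / num_blocks           # numbers per block
    min_find = math.sqrt(block_size)      # queries of one minimum finding run
    return {
        "step1": num_blocks * min_find * math.log2(s),      # error 1/S^2 per block
        "step2": s,                                         # building the heap
        "step3abd": n * (log_n + math.log2(s) * log_n),     # heap work over n rounds
        "step3c": n * min_find * log_n,                     # error 1/n^2 per round
    }

n = 2 ** 40
costs = step_costs(n, s=math.log2(n) ** 3)
for name, value in costs.items():
    print(f"{name}: {value:.3e}")
```

With c = 1 the entry for step 3.c) coincides with $n^{3/2}\log^{3/2}(n)/\sqrt{S}$, the bound claimed in Theorem 1.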

Note that for reusing the $O(\log^3 n)$ qubits of storage needed by the minimum
finding algorithm in each iteration, it is understood that this algorithm is used
in the following way. It is run in the usual way, with measurements deferred to
the end. Then (instead of measuring) its output is copied to some qubits using
controlled-nots. Afterwards the minimum finding algorithm is run backwards to
clean up the storage it has used. Since the error of the algorithm is small enough
this leads to an algorithm with overall error bounded by $O(1/n)$; compare [?]
for details on how to run subroutines on a quantum computer.

So overall the storage bound is not violated. Hence the algorithm behaves 
as announced. 






4 The Lower Bound 



Now we give the proof of Theorem 2. After some simple preparations we show
how to decompose a quantum circuit into slices that contain only few oracle
gates but must (on average) produce many outputs. Then, in our main lemma,
we give an upper bound on the number of outputs such a slice can give. In the
rest of this section we prove that lemma.

4.1 Preparations 

Let A be a quantum circuit with T oracle gates and S work qubits. A contains
n output operations, each of which writes on $3\log n$ of the output qubits. If
one of these outputs is measured in the standard basis, then with probability
2/3 a pair $(\mathrm{Min}(x, j_i), j_i)$ is produced, and the $j_i$ form a fixed permutation of
$\{1, \ldots, n\}$.

First we note that the success probability can be improved in some sense. 

Lemma 7 The success probability can be improved to $1 - \epsilon$ for any constant
$\epsilon > 0$ without changing S, T by more than a constant factor, at the expense of
adding a circuit consisting of $O(\log n)$ qubits and a majority computation to any
output gate.

Proof. We can use the circuit some l times "in parallel". For each output, first
all l "parallel" outputs are mapped to some extra work storage ($O(l \cdot \log n)$
qubits), then an operator is applied to these that computes the most frequent
output and maps it to the real output qubits. By standard arguments the error
probability drops exponentially in l, and so $l = O(1)$ suffices. Hence the increase
in space is a factor of l and an additive $O(\log n)$, the increase in time is a factor
of l and an additive majority computation ($\mathrm{poly}(\log n)$ gates) for each output,
and the number of queries goes up by a factor of l. □

Note that we cannot reuse the $O(\log n)$ extra qubits for different outputs,
since even if we try to uncompute the computation on them, due to the constant
error the result will not be close to a blank state. The lower bound proof
will enforce the space restriction only between slices of the circuit, so the above
construction is still useful. Alternatively we could consider a more general circuit
model, in which "fresh" qubits may be added any time and qubits may be
"thrown away". In such a model the space restriction would refer only to the
maximal number of qubits used at the same time.

Due to our output convention now each individual output is correct with
probability $1 - \epsilon$ for some arbitrarily small constant $\epsilon$. But then many outputs
must be correct simultaneously with high probability.

Lemma 8 Assume each individual output is correct with probability $1 - \epsilon$. Let
R be any set of l outputs. Then with probability $1 - \sqrt{\epsilon}$ at least $(1 - \sqrt{\epsilon})l$ of the
outputs in R are correct.

Proof. If the probability that at least $(1 - \sqrt{\epsilon})l$ of the outputs in R are correct
simultaneously is less than $1 - \sqrt{\epsilon}$, then the expected number of correct outputs
in R is less than $(1 - \sqrt{\epsilon}) \cdot l + \sqrt{\epsilon} \cdot (1 - \sqrt{\epsilon})l = (1 - \epsilon)l$, which is impossible due
to the linearity of expectation. □

Hence with large probability at least a big fraction of the outputs are correct.

4.2 Slicing Quantum Circuits

Consider the following way to slice a given quantum circuit A on S qubits with
T queries. Fix some parameter $\delta$. Slice $A_1$ contains all the gates from the
beginning of the circuit up to the $\delta\sqrt{n}$-th query gate. Slice $A_2$ contains the
next gates up to the $2\delta\sqrt{n}$-th query gate, and so on. Overall there are
$M = \lceil T/(\delta\sqrt{n}) \rceil$ slices. Note that each slice $A_i$ is a quantum circuit that contains
$\delta\sqrt{n}$ queries, and that uses S qubits of work space which are initialized to some
state depending on what was computed by the slices $A_1, A_2, \ldots, A_{i-1}$. Note
that for the sorting problem the average number of outputs in a slice is n/M.

In the following we consider the computational power of an individual slice.
We will give an upper bound on the number of outputs a slice can make. The
set of outputs we consider will be restricted to those which output one of the
numbers smaller than the median, i.e., outputs for the largest n/2 numbers
will not be considered. The inputs are of the form $x = x_1, \ldots, x_n$ with all
$x_i \in \{1, \ldots, n^2\}$ and $x_i \ne x_j$ for all $i \ne j$. We can now state our main lemma.

Lemma 9 (Main) Let A be any quantum algorithm that is initially given some
S qubits in an arbitrary state depending on the classical input x, and that
afterwards accesses x via $\delta\sqrt{n}$ oracle queries only. Assume that A produces l
outputs $O_1(x), \ldots, O_l(x)$ for the numbers $\mathrm{Min}(x, j_1), \ldots, \mathrm{Min}(x, j_l)$ with
$j_1 < \cdots < j_l < n/2$, and that for all x and $i \in \{1, \ldots, l\}$ the output $O_i$ is correct with
probability $1 - \epsilon$. Then for $\delta = 10^{-4}$ it holds that

$$(1 - \sqrt{\epsilon}) \cdot 2^{-S} \cdot 2^{-l \cdot H(\sqrt{\epsilon})} \le (0.99)^{(1 - \sqrt{\epsilon})l}.$$

Note that the space bound enters the main lemma only via the amount of
initial information. The function H is the binary entropy function
$H(p) = -p\log p - (1 - p)\log(1 - p)$. Let us now deduce Theorem 2 from the lemma.

Proof of Theorem 2. Given is a circuit A. First apply Lemma 7 to reduce the
error probability to $\epsilon$ for some small enough constant $\epsilon$. Consider any slice $A_i$ of
the obtained circuit and assume that the slice makes some l outputs. Note that
the $\log n$ qubits needed for reducing the error at each output are never reused
and hence do not contribute to the initial information. If $l = cS$ for some large
enough constant c, then the lemma says

$$1 - \sqrt{\epsilon} \le 2^S \cdot 2^{cS \cdot H(\sqrt{\epsilon})} \cdot (0.99)^{(1 - \sqrt{\epsilon})cS} < 1/2.$$

Contradiction! So the number of outputs in the slice is at most O(S).

There are $M \le \lceil T/(\delta\sqrt{n}) \rceil$ slices producing n/2 outputs, but the overall
number of outputs is at most $\lceil T/(\delta\sqrt{n}) \rceil \cdot cS$, so $TS = \Omega(n^{3/2})$. □
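
The last step of this proof can be spelled out as follows. This is a sketch, assuming the ceiling is bounded by its argument plus one and, for the final inequality, that $S \le n/(4c)$ (an assumption added here, which holds for all sufficiently large n when S = O(n/log n)); c and $\delta$ are the constants from the proof.

```latex
\frac{n}{2} \;\le\; \Bigl\lceil \frac{T}{\delta\sqrt{n}} \Bigr\rceil \cdot cS
      \;\le\; \Bigl( \frac{T}{\delta\sqrt{n}} + 1 \Bigr) cS
\quad\Longrightarrow\quad
TS \;\ge\; \frac{\delta}{2c}\, n^{3/2} - \delta\sqrt{n}\, S
   \;\ge\; \frac{\delta}{4c}\, n^{3/2}
\qquad \text{for } S \le \frac{n}{4c}.
```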

4.3 Proof of the Main Lemma



The plan of the proof is to show first that in A for each x the expectation over
all subsets containing $l' = (1 - \sqrt{\epsilon})l$ outputs of the probability that these are
simultaneously correct is at least $(1 - \sqrt{\epsilon}) \cdot 2^{-S} \cdot 2^{-l \cdot H(\sqrt{\epsilon})}$, even if the initial
information is replaced by a completely mixed state.

Then we show that in any algorithm for any l' output positions the expectation
over inputs x of the probability that the outputs are simultaneously correct
is at most $(0.99)^{l'}$. These two statements together imply the inequality in the
main lemma. Set $\delta = 10^{-4}$.

Lemma 10 Suppose an algorithm produces l outputs, and for all inputs x each
output $O_i(x)$ is equal to $(\mathrm{Min}(x, j_i), j_i)$ with probability $1 - \epsilon$. Then for all
inputs the expectation over all sets containing l' outputs of the probability that
these are simultaneously correct is at least $(1 - \sqrt{\epsilon}) \cdot 2^{-H(\sqrt{\epsilon})l}$.

Proof. First apply Lemma 8. This lemma implies that with probability $1 - \sqrt{\epsilon}$
at least l' outputs are simultaneously correct.
In other words for all x:

$$\mathrm{Prob}(\exists\, l' \text{ outputs } O_i(x) = (\mathrm{Min}(x, j_i), j_i)) \ge 1 - \sqrt{\epsilon},$$

where the probability is over the measurements. This implies

$$\sum_{1 \le o_1 < \cdots < o_{l'} \le l} \mathrm{Prob}(\forall i : O_{o_i}(x) = (\mathrm{Min}(x, j_{o_i}), j_{o_i})) \ge 1 - \sqrt{\epsilon}$$

and hence with $\binom{l}{(1 - \sqrt{\epsilon})l} \le 2^{lH(\sqrt{\epsilon})}$ the lemma follows. □
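
The binomial estimate used in the last step is the standard entropy bound on binomial coefficients. A brief numerical check, written as a sketch with illustrative function names and base-2 logarithms, compares the logarithm of the binomial coefficient with l times the binary entropy of the chosen fraction:

```python
import math

def binary_entropy(p):
    """H(p) = -p log p - (1 - p) log(1 - p), with H(0) = H(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def entropy_bound_holds(l, k):
    """Check that binom(l, k) <= 2^(l * H(k / l)), compared on a log scale."""
    return math.log2(math.comb(l, k)) <= l * binary_entropy(k / l) + 1e-9

# Example: choosing 90 out of 100 outputs.
print(math.log2(math.comb(100, 90)), 100 * binary_entropy(0.9))
print(all(entropy_bound_holds(l, k) for l in range(1, 61) for k in range(l + 1)))
```

Since H(p) = H(1 - p), it makes no difference whether the entropy is evaluated at the chosen fraction or at its complement.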

Assume that some l' output gates are supposed to produce the numbers
$\mathrm{Min}(x, k_1), \ldots, \mathrm{Min}(x, k_{l'})$ for some $k_1 < \cdots < k_{l'} < n/2$ and that these gates
are $O_{k_1}, \ldots, O_{k_{l'}}$ (renumbered for convenience).

Next we get rid of the initial information, at the expense of increasing the
failure probability. We use Lemma 3, restated here in a more specific form.

Lemma 11 Suppose there is an algorithm that uses S qubits of initial information
and otherwise makes only queries to the input x. Suppose the algorithm outputs
a fixed set $\mathrm{Min}(x, k_1), \ldots, \mathrm{Min}(x, k_{l'})$ simultaneously correct with probability
$P_x(k_1, \ldots, k_{l'})$.

Then there is an algorithm that uses no initial information, makes the same
number of queries, and has success probability $P_x(k_1, \ldots, k_{l'})/2^S$.

So for all x the expectation over all subsets of l' outputs of the success
probability is at least $(1 - \sqrt{\epsilon})2^{-S}2^{-H(\sqrt{\epsilon})l}$. For a contrasting statement we
now consider any fixed set of l' outputs and the expected success probability
over all inputs x. At this point we are left with the following problem. We
have a circuit that is supposed to output l' numbers from the sorted sequence
with some expected success probability P. The circuit accesses the input only
via $\delta\sqrt{n}$ queries. We have to show that P is exponentially small in l'. This is
established by the following lemma.

Lemma 12 For any algorithm that uses $\delta\sqrt{n}$ queries to inputs x and tries to
output $\mathrm{Min}(x, k_1), \ldots, \mathrm{Min}(x, k_{l'})$, the expectation (over all x) of the success
probability is at most $0.99^{l'}$.

Note that this lemma together with the previous two lemmas immediately 
implies the main lemma. In the rest of this section we provide its proof. 

First note that we can assume that the algorithm never outputs the same 
number at two different output gates, since in this case there must be an error 
anyway and the respective outputs may be changed arbitrarily. 

Our plan is to employ an adversary argument like in Lemma 5. Therefore
we need inputs where we can cheat well. Let $K = \{k_1, \ldots, k_{l'}\}$. Fix some input
x. Let $R(x) = \{r_1(x), \ldots, r_{t(x)}(x)\}$ denote a maximal set of positions from
$1, \ldots, n/2$ so that $R(x) \subseteq K$ and $\mathrm{Min}(x, r_{i+1}(x)) - \mathrm{Min}(x, r_i(x)) \ge n/8$ for all
i. I.e., for each x we single out a set of positions so that the distance between
the elements at these positions in the sorted sequence is almost as large as the
average.

Proposition 1 $t(x) \ge l'/2$ with probability $1 - 1/2^{l'/2}$. 
Proof. 

$$\mathrm{Prob}(t(x) < l'/2) \le \mathrm{Prob}(\text{There are } l'/2 \text{ positions } k \in K : \mathrm{Min}(x, k) - \mathrm{Min}(x, k-1) < n/8).$$

It is not hard to see that for all $k$ 

$$\mathrm{Prob}(\mathrm{Min}(x, k) - \mathrm{Min}(x, k-1) < n/8) < 1/8$$

even when conditioned on arbitrarily many events of the form $\mathrm{Min}(x, k') - \mathrm{Min}(x, k'-1) < n/8$, 
since the event that some intervals are short can only decrease the probability 
that another interval is short. Hence 

$$\mathrm{Prob}(\text{There are } l'/2 \text{ positions } k \in K : \mathrm{Min}(x, k) - \mathrm{Min}(x, k-1) < n/8) \le (1/8)^{l'/2} \cdot \binom{l'}{l'/2} \le 2^{-l'/2}.$$

□ 
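The two estimates used in this proof can be sanity-checked numerically. The sketch below is an illustration only, not part of the original argument. It assumes that the input is a uniformly random $n$-element subset of $\{1, \ldots, n^2\}$; under that assumption the probability that one fixed gap is at least $g$ equals $\binom{N-g+1}{n}/\binom{N}{n}$ with $N = n^2$, by a shifting bijection. The function names `prob_short_gap` and `union_bound` are hypothetical.

```python
from fractions import Fraction
from math import comb


def prob_short_gap(n: int) -> Fraction:
    """Exact Prob(Min(x,k) - Min(x,k-1) < n/8) for one fixed k.

    Assumption (not from the paper): x is a uniformly random n-element
    subset of {1, ..., N} with N = n^2. Moving every element from the k-th
    smallest on down by g - 1 maps the subsets whose k-th gap is >= g
    bijectively onto the n-element subsets of {1, ..., N - g + 1}, so
    Prob(gap >= g) = C(N - g + 1, n) / C(N, n).
    """
    N = n * n
    g = -(-n // 8)  # ceil(n/8): the smallest integer gap that is not short
    return 1 - Fraction(comb(N - g + 1, n), comb(N, n))


def union_bound(l_prime: int) -> Fraction:
    """The quantity (1/8)^(l'/2) * C(l', l'/2) from the last display."""
    half = l_prime // 2
    return Fraction(1, 8) ** half * comb(l_prime, half)


if __name__ == "__main__":
    for n in (64, 256, 1024):
        print(n, float(prob_short_gap(n)))  # each value stays below 1/8
    for l_prime in (2, 10, 50):
        # True whenever the bound is at most 2^(-l'/2)
        print(l_prime, union_bound(l_prime) <= Fraction(1, 2 ** (l_prime // 2)))
```

Exact rational arithmetic is used so that the comparison with $1/8$ and with $2^{-l'/2}$ does not depend on floating-point rounding.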

Denote by $q_x^i$ the probability that the outputs $O_{r_1(x)}, O_{r_2(x)}, \ldots, O_{r_i(x)}$ are 
correct on $x$. Also denote by $p_x^i$ the probability that $O_{r_i(x)}$ is correct (on $x$) 
conditioned on the event that the previous outputs are correct. Then $q_x^{i+1} = q_x^i p_x^{i+1}$. 
We are interested in bounding $E[q_x^{l'/2}]$. Let us simply neglect the inputs 
$x$ for which $t(x) < l'/2$ in the following. These contribute at most $1/2^{l'/2}$ to the 
success probability due to the above proposition. We plan to show that there is 
a constant factor gap between $E[q_x^i]$ and $E[q_x^{i+1}]$. 

The next proposition states the main adversary argument. 

Proposition 2 Let $x$ and $i$ be given. There is a set $J_{x,i} \subseteq \{1, \ldots, n\}$ of size 
$n/2 - \delta n$ so that for each $j \in J_{x,i}$ there are 

$$\mathrm{Min}(x, r_{i+1}(x)) - \mathrm{Min}(x, r_i(x)) - 1$$

inputs $x'(j)$ so that with probability $q_x^i (p_x^{i+1} - \sqrt{\delta})$ the outputs $O_{r_1(x'(j))}, O_{r_2(x'(j))}, 
\ldots, O_{r_i(x'(j))}$ are correct and $O_{r_{i+1}(x'(j))}$ is incorrect on $x'(j)$. 

Furthermore, each such $x'(j)$ is "generated" in this way by at most $n^2$ inputs $x$. 

Proof. We want to apply Lemma El. Let the condition $F$ of that lemma be 
that the outcomes of the measurements for $O_{r_1(x)}$ up to $O_{r_i(x)}$ are correct. 
The event $E$ is set to be that the measurement for $O_{r_{i+1}(x)}$ is correct, i.e., 
equals $\mathrm{Min}(x, r_{i+1}(x))$. Clearly $F$ occurs with probability $q_x^i$ on $x$ and $E$ occurs 
conditionally with probability $p_x^{i+1}$. Then the lemma tells us that we may switch 
$(1 - \delta)n$ positions in $x$ arbitrarily, and still get the same measurement results with 
probability $q_x^i (p_x^{i+1} - \sqrt{\delta})$. To avoid changing the correctness of previous outputs 
we only flip those positions containing numbers larger than $\mathrm{Min}(x, n/2)$. Thus 
we can flip a set $J_{x,i}$ of at least $n/2 - \delta n$ positions. 

If we change $x_j$ for $j \in J_{x,i}$ so that its new value $a$ is between $\mathrm{Min}(x, r_i(x))$ 
and $\mathrm{Min}(x, r_{i+1}(x)) - 1$, then $r_1(x) = r_1(x'(j)), \ldots, r_i(x) = r_i(x'(j))$. Furthermore 
the following happens. 

• Either $a - \mathrm{Min}(x, r_i(x)) < n/8$, and then $r_{i+1}(x'(j)) = r_{i+1}(x) + 1$. In this 
case with probability $q_x^i (p_x^{i+1} - \sqrt{\delta})$ on $x'(j)$ the first $i$ outputs in $R(x'(j))$ 
are correct, and gate $O_{r_{i+1}(x)}$ outputs $b = \mathrm{Min}(x, r_{i+1}(x))$. Since the 
same number is never output twice the output on $O_{r_{i+1}(x'(j))}$ is not equal 
to $b$ with the same probability, which is an error. 

• Otherwise $a - \mathrm{Min}(x, r_i(x)) > n/8$ and so $r_{i+1}(x'(j)) = r_{i+1}(x)$. In this 
case with probability $q_x^i (p_x^{i+1} - \sqrt{\delta})$ on $x'(j)$ the first $i$ outputs in $R(x'(j))$ 
are correct, and the output $O_{r_{i+1}(x'(j))}$ is $\mathrm{Min}(x, r_{i+1}(x)) \ne a$, so again 
the first $i$ outputs in $R(x'(j))$ are correct and the $(i+1)$st is not. 

Note that each $x'(j)$ is derived from at most $n^2$ inputs $x$, since to change $x'(j)$ 
to $x$ we have to change one position to some other value. □ 
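The effect of the adversary's change on the sorted sequence can be seen on a toy input. The following sketch is only an illustration under simplifying assumptions: the positions `r_i` and `r_next` are chosen by hand rather than computed as a maximal set $R(x)$, the $n/8$ spacing threshold is ignored, and the helper names `min_k` and `modify` are hypothetical.

```python
def min_k(x, k):
    """Min(x, k): the k-th smallest element of x (k is 1-indexed)."""
    return sorted(x)[k - 1]


def modify(x, j, a):
    """Return x'(j): a copy of x whose position j (0-indexed here) holds a."""
    y = list(x)
    y[j] = a
    return y


if __name__ == "__main__":
    x = [10, 50, 20, 40, 30, 60, 80, 70]  # n = 8 distinct numbers
    r_i, r_next = 1, 3                    # stand-ins for r_i(x) and r_{i+1}(x)
    j = x.index(80)                       # x_j is larger than Min(x, n/2) = 40
    a = 25                                # Min(x, r_i) < a < Min(x, r_next)
    x_new = modify(x, j, a)

    b = min_k(x, r_next)                  # value that gate O_{r_next} outputs on x
    # Outputs up to position r_i are unaffected by the change.
    print(min_k(x_new, r_i) == min_k(x, r_i))
    # At position r_next the correct answer on x_new is now a, not b.
    print(min_k(x_new, r_next), b)
    # The old value b has moved one position up in the sorted sequence.
    print(min_k(x_new, r_next + 1) == b)
```

An algorithm that cannot distinguish `x` from `x_new` keeps answering `b` at position `r_next`, which is correct on `x` but wrong on `x_new`; this is the kind of error the proposition counts.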

Now it is clearly true that 

$$E[q_x^i] = E[q_x^i \cdot (p_x^{i+1} + 1 - p_x^{i+1})],$$

and $q_x^i (1 - p_x^{i+1})$ is the probability that the first $i$ outputs in $R(x)$ are correct 
and the $(i+1)$st is wrong. 

If $\mathrm{Min}(x, r_{i+1}(x)) - \mathrm{Min}(x, r_i(x)) > 8n$, we simply use $q_x^i (1 - p_x^{i+1}) \ge 0$. 

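Otherwise we estimate 

$$q_x^i (1 - p_x^{i+1}) \ge \max_{y:\, x = y'(j)} q_y^i (p_y^{i+1} - \sqrt{\delta}) \ge \Big( \sum_{y:\, x = y'(j)} q_y^i (p_y^{i+1} - \sqrt{\delta}) \Big) / n^2$$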






due to the above proposition, where the notation $x = y'(j)$ indicates that $x$ 
can be derived from $y$ by a change of the position $j$ with $x_j = \mathrm{Min}(x, r_{i+1}(x))$ 
as described above. Inserting these estimates and rearranging the terms in the 
expectation so that the errors are accounted for together with the inputs the 
adversary changes (rather than the inputs resulting from the changes) we get 



$$E[q_x^i] \ge E\Big[ q_x^{i+1} + q_x^i (p_x^{i+1} - \sqrt{\delta}) \cdot (n/2 - \delta n) \cdot \frac{D(x,i)}{n^2} \Big],$$



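where $D(x,i) = \min\{\mathrm{Min}(x, r_{i+1}(x)) - \mathrm{Min}(x, r_i(x)) - 1,\ 8n\}$. Therefore, 
recalling that 

$$\mathrm{Min}(x, r_{i+1}(x)) - \mathrm{Min}(x, r_i(x)) - 1 \ge n/8,$$

we get 

$$E[q_x^i] + 1/2 \cdot \sqrt{\delta}\, E[q_x^i \cdot 8] \ge E[q_x^{i+1}] + (1/2 - \delta) \cdot E[q_x^{i+1} \cdot 1/8].$$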



Consequently, for a sufficiently small constant $\delta$, $E[q_x^i] \cdot 0.98 \ge E[q_x^{i+1}]$, and this 
holds for $i = 1, \ldots, l'/2$ (neglecting at most a $1/2^{l'/2}$ fraction of all inputs), so 
we have $E[q_x^{l'/2}] \le 2^{-l'/2} + 0.98^{l'/2}$. 
Hence the expected probability of getting $O_1, \ldots, O_{l'}$ correct is at most $0.99^{l'}$ for 
large enough $l'$. 
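The constants in this last step can be checked with a few lines of arithmetic. The sketch below is an illustration under an explicit assumption: it reads the final displayed inequality as $(1 + 4\sqrt{\delta})\,E[q_x^i] \ge (1 + (1/2 - \delta)/8)\,E[q_x^{i+1}]$, which is a reconstruction and not a formula quoted from the text. The function names are hypothetical.

```python
from math import sqrt


def contraction_factor(delta: float) -> float:
    """Smallest c such that E[q^{i+1}] <= c * E[q^i] follows from
    (1 + 4*sqrt(delta)) * E[q^i] >= (1 + (1/2 - delta)/8) * E[q^{i+1}].
    The inequality itself is an assumed reading of the derivation."""
    return (1 + 4 * sqrt(delta)) / (1 + (0.5 - delta) / 8)


def final_bound_holds(l_prime: int) -> bool:
    """Check whether 2^(-l'/2) + 0.98^(l'/2) <= 0.99^l' for this l'."""
    half = l_prime / 2
    return 2.0 ** (-half) + 0.98 ** half <= 0.99 ** l_prime


if __name__ == "__main__":
    for delta in (1e-2, 1e-3, 1e-4):
        print(delta, contraction_factor(delta))
    for l_prime in (2, 100, 1000):
        print(l_prime, final_bound_holds(l_prime))
```

Under this reading the factor drops below $0.98$ only once $\delta$ is a rather small constant (for instance $\delta = 10^{-4}$), and the final bound needs $l'$ to be moderately large because $0.98^{1/2} \approx 0.98995$ is only slightly smaller than $0.99$.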



5 Conclusions and Open Problems 

We have shown that quantum computers, though in general not much faster 
for the important task of sorting, outperform classical computers significantly 
in space bounded sorting. This setting is motivated by applications with distributed 
data too large to fit into the working memory of a single computer. 
Furthermore understanding the complexity of such a basic problem is important 
in itself. We have seen that for sorting a tradeoff result of the form $T^2 S \le \tilde{O}(n^3)$ 
exists, as opposed to the classical tradeoff $TS = \Omega(n^2)$. Exploring the question 
whether the upper bound on quantum sorting is tight we have proved a lower 
bound of $TS = \Omega(n^{3/2})$, showing that for small space bounds the algorithm is 
not too far from optimal. This lower bound actually holds in the average case 
sense, i.e., for random sets of $n$ numbers from an $n^2$ range. 
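To get a feeling for how these tradeoffs compare, one can plug concrete values into the three bounds. The sketch below is a rough illustration only: it drops all constant factors, takes logarithms to base 2 (an assumption; the base only changes constants), uses the upper bound $T = O(n^{3/2} \log^{3/2} n / \sqrt{S})$ from the abstract, and uses hypothetical function names.

```python
from math import log2, sqrt


def classical_time_lower(n: float, s: float) -> float:
    """T >= n^2 / S, from the classical tradeoff TS = Omega(n^2)."""
    return n ** 2 / s


def quantum_time_upper(n: float, s: float) -> float:
    """T = n^(3/2) * log^(3/2)(n) / sqrt(S), the quantum upper bound."""
    return n ** 1.5 * log2(n) ** 1.5 / sqrt(s)


def quantum_time_lower(n: float, s: float) -> float:
    """T >= n^(3/2) / S, from the quantum tradeoff TS = Omega(n^(3/2))."""
    return n ** 1.5 / s


if __name__ == "__main__":
    n, s = 2 ** 40, 2 ** 17  # satisfies log^3 n < S < n / log n
    print("quantum lower  :", quantum_time_lower(n, s))
    print("quantum upper  :", quantum_time_upper(n, s))
    print("classical lower:", classical_time_lower(n, s))
```

With constants ignored, the quantum upper bound lies below the classical lower bound whenever $S < n/\log^3 n$, and the ratio between the quantum upper and lower bounds is $\sqrt{S} \log^{3/2} n$, which is why the bounds are close only for small $S$.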

Our result can easily be adapted to give the bound $ST = \Omega(n^2)$ for classical 
sorting in the situation that all input positions are known to hold mutually 
distinct numbers, by considering circuit slices of length $\delta n$, and using simple 
adaptations of Lemma 3 and Fact^ to the classical case. 

Corollary 2 Any classical sorting algorithm with time T queries and space S 
that sorts numbers under the condition that the input contains mutually distinct 
elements only, needs $TS = \Omega(n^2)$. 

The best previous bound for sorting under the promise that all numbers are 
mutually distinct is $TS = \Omega(n^2 \cdot \log\log n/\log n)$, given in [19]. 

The most important open problem of this paper is whether the lower bound 
can be improved to (almost) match the upper bound, which is what we conjecture. 
To prove such a result it seems one should show that circuit slices 
containing $\sqrt{nl}$ queries cannot produce more than $O(l)$ outputs. To do so it 
would suffice to show that any quantum algorithm with $\delta\sqrt{nl}$ queries that tries 
to compute $l$ elements of the sorted sequence succeeds only with probability 
$2^{-\Omega(l)}$. 

It is also of interest what the time-space complexity of sorting is if we allow 
no errors. In this situation approaches based on Grover search fail and it is quite 
possible that the same tradeoff as in the classical case holds. 

Consider the element distinctness problem, i.e., deciding whether $n$ given 
numbers are all pairwise different. It is conjectured that for classical computers 
the element distinctness problem has about the same complexity as sorting, 
and is thus a decision problem capturing the difficulty of sorting. Buhrman 
et al. describe in [§] a quantum algorithm that runs in sublinear time, and 
a slight variation of their algorithm achieves a tradeoff of $T^2 S = O(n^2)$ for 
deciding element distinctness. Hence in the quantum case element distinctness 
is strictly easier than sorting due to the lower bound for sorting given in this 
paper; consider, e.g., the case that $S \le \mathrm{poly}(\log n)$. Can a matching lower bound 
be shown for element distinctness? Is element distinctness really as hard as 
sorting classically? Note that strong classical tradeoffs are known for element 
distinctness if only a comparison oracle is used [El [^, but the best tradeoff 
known for general models is $T = \Omega(n \log(n/S))$, given in 0], which yields no better product 
tradeoff than $ST = \Omega(n \log^2 n)$. A quantum query lower bound of $\Omega(n^{2/3})$ has 
recently been shown by Shi [20]. 

Another open problem concerns the query problem we have analyzed. We 
have shown that finding $l$ elements of the sorted sequence within $\delta\sqrt{n}$ queries is 
possible with exponentially small success probability only. More generally, suppose 
one is given $l$ instances of a query problem, and is allowed to do a number 
$q$ of queries that is known to give the result for one instance with probability 
$p < 1$ only, on average over all inputs. How large is the success probability of 
computing correctly on $l$ instances? In other words, does a direct product result 
hold for quantum black-box algorithms? Such theorems for classical query 
algorithms are given in [15, 17]. 